Web Catchphrase Improve System Employing Onomatopoeia and Large-Scale N-gram Corpus

نویسندگان

  • Hiroaki Yamane
  • Masafumi Hagiwara
چکیده

In this paper, we propose a system which improves text catchphrases on the web using onomatopoeia and the Japanese Google N-grams. Onomatopoeia is regarded as a fundamental tool in daily communication for people. The proposed system inserts an onomatopoetic word into plain text catchphrases. Being based on a large catchphrase encyclopedia, the proposed system evaluates each catchphrase’s candidates considering the words, structure and usage of onomatopoeia. That is, candidates are selected whether they contain onomatopoeia and they use specific catchphrase grammatical structures. Subjective experiments show that inserted onomatopoeia is effective for making attractive catchphrases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information

Web provides a large-scale corpus for researchers to study the language usages in real world. Developing a web-scale corpus needs not only a lot of computation resources, but also great efforts to handle the large variations in the web texts, such as character encoding in processing Chinese web texts. In this paper, we aim to develop a web-scale Chinese word N-gram corpus with parts of speech i...

متن کامل

N-gram Counts and Language Models from the Common Crawl

We contribute 5-gram counts and language models trained on the Common Crawl corpus, a collection over 9 billion web pages. This release improves upon the Google n-gram counts in two key ways: the inclusion of low-count entries and deduplication to reduce boilerplate. By preserving singletons, we were able to use Kneser-Ney smoothing to build large language models. This paper describes how the c...

متن کامل

Product Recommendation Method based on Onomatopoeia Expressing Texture

With the widespread use of online shopping in recent years, consumer search requests for products have become more diverse. Previous web search methods have used adjectives as input by consumers. However, given that the number of adjectives that can be used to express textures is limited, it is debatable whether adjectives are capable of richly expressing variations of product textures. In Japa...

متن کامل

Search right and thou shalt find ... Using Web Queries for Learner Error Detection

We investigate the use of web search queries for detecting errors in non-native writing. Distinguishing a correct sequence of words from a sequence with a learner error is a baseline task that any error detection and correction system needs to address. Using a large corpus of error-annotated learner data, we investigate whether web search result counts can be used to distinguish correct from in...

متن کامل

Chinese Web Scale Linguistic Datasets and Toolkit

The web provides a huge collection of web pages for researchers to study natural languages. However, processing web scale texts is not an easy task and needs many computational and linguistic resources. In this paper, we introduce two Chinese parts-of-speech tagged web-scale datasets and describe tools that make them easy to use for NLP applications. The first is a Chinese segmented and POS-tag...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Int. J. Fuzzy Logic and Intelligent Systems

دوره 12  شماره 

صفحات  -

تاریخ انتشار 2012